TaSTT: A deliciously free STT

TaSTT (pronounced "tasty") is a free speech-to-text tool for VRChat. It uses a GPU-based transcription algorithm to turn your voice into text, then sends it into VRChat via OSC.
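To illustrate the OSC leg of the pipeline: VRChat listens for OSC over UDP on port 9000, and the built-in chatbox is driven by the /chatbox/input endpoint, which takes a string (the text) and a boolean ("send immediately", skipping the in-game keyboard). Below is a minimal hand-rolled sketch of that message format, written for illustration and not taken from TaSTT's source:

```python
import socket

def osc_string(s: str) -> bytes:
    """Encode a string per OSC 1.0: UTF-8, NUL-terminated, padded to a 4-byte boundary."""
    data = s.encode("utf-8") + b"\x00"
    return data + b"\x00" * ((-len(data)) % 4)

def chatbox_message(text: str, send_now: bool = True) -> bytes:
    """Build an OSC message for VRChat's /chatbox/input endpoint.

    OSC booleans are encoded in the type-tag string itself (T or F)
    and carry no payload bytes, so the message is just three padded
    strings: the address, the tags, and the text argument.
    """
    tags = ",s" + ("T" if send_now else "F")
    return osc_string("/chatbox/input") + osc_string(tags) + osc_string(text)

def send_to_vrchat(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """VRChat listens for OSC over UDP on port 9000 by default."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(chatbox_message(text), (host, port))
```

In practice you would use an OSC library rather than encoding by hand, but the format is simple enough that the whole round trip fits in a few lines.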

To get started, download the latest .zip from the releases page.

Speech-to-text demo

Contents:

- Usage and setup
- Features
- Requirements
- Motivation
- Design overview
- Contributing
- Roadmap
- Backlog

Made with love by yum_food.

Usage and setup

Download the latest .zip from the releases page.

Please join the discord to share feedback and get technical help.

To build your own package from source, see GUI/README.md.

Basic controls:

- Short click to toggle transcription.
- Medium click to hide the text box.
- Hold to update the text box without unlocking it from worldspace.
- Medium click + hold to type using STT.
- Scale up/down in the radial menu.

Design philosophy

- All language services are performed on the client. No network hops in the critical path.
- Priorities (in descending order): reliability, latency, accuracy, performance, aesthetics.
- No telemetry of any kind in the app. GitHub and Discord are the only means I have to estimate usage and triage bugs.
- Permissive licensing. Users should be legally entitled to hack, extend, relicense, and profit from this codebase.

Features

- Works with the built-in chatbox (usable with public avatars!)
- Customizable board resolution, up to ridiculous sizes.
- Lightweight design:
  - The custom textbox requires as few as 65 parameter bits.
  - Transcription doesn't affect VRChat framerate much, since VRC is heavily CPU-bound.
  - Performance impact when not speaking is negligible.
- Browser source. Use with OBS!
- Multi-language support:
  - Japanese, Korean, and Chinese glyphs included, among many other languages.
  - The full list of Unicode blocks is defined in generate_fonts.py.
  - Whisper natively supports transcription in 100 languages.
- Customizable:
  - The control button may be set to left/right a/b/joystick.
  - Text color, background color, and border color are customizable in the shader.
  - The text background may be customized with PBR textures: base color, normal, metallic, roughness, and emission are all implemented.
  - Border width and rounding are customizable.
  - The shader supports physically based shading: smoothness, metallic, and emissive.
- Many optional quality-of-life features:
  - Audio feedback: hear distinct beeps when transcription starts and stops.
  - Optional in-game noise indicator, to grab others' attention.
  - Visual transcription indicator.
  - Resize with a blendtree in your radial menu.
  - Locks to world space when done speaking.
- Privacy-respecting: transcription is done on your GPU, not in the cloud.
- Hackable. From-scratch implementation.
- Free as in beer. Free as in freedom. MIT license.

Requirements

System requirements:

- ~2GB disk space.
- NVIDIA GPU with at least 2GB of spare VRAM. You can run it in CPU mode, but it's really slow and lags you a lot more, so I wouldn't recommend it. I've tested on a 1080 Ti and a 3090 and saw comparable latency.
- SteamVR.
- No write defaults on your avatar, if you're using the custom text box.

Avatar resources used:

- Tris: 4
- Material slots: 1
- Texture memory: 340 KB (English), 130 MB (international)
- Parameter bits: 65-217 (configurable; more bits == faster paging)
- Menu slots: 1

Motivation

Many VRChat players choose not to use their mics, but as a practical matter, occasionally have to communicate. I want this to be as simple, efficient, and reliable as possible.

There are existing tools which help here, but they are all imperfect for one reason or another:

- RabidCrab's STT costs money and relies on cloud-based transcription. Because of that reliance on cloud services, it's typically slower and less reliable than local transcription. The in-game text box is not visible in streamer mode and limits you to one update every ~2 seconds, making it a poor choice for latency-sensitive communication.
- KillFrenzy's AvatarText only supports text-to-text. It's an excellent product with high-quality source code, but it lacks integration with a client-side STT engine.
- I5UCC's VRCTextboxSTT makes KillFrenzy's AvatarText and Whisper kiss. It's the closest spiritual cousin to this repository, and the author has made incredible sustained progress on the problem. Definitely take a look!
- VRCWizard's TTS-Voice-Wizard also uses Whisper, but it relies on the C# interface to Const-Me's CUDA-enabled Whisper implementation. That implementation does not support beam search decoding and waits for pauses to segment your voice, so it's less accurate and higher latency than this project's Python-based transcription engine, though more performant. It supports more features (like cloud-based TTS), so you might want to check it out.
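The beam search decoding mentioned above keeps the k best partial transcripts at each step instead of greedily committing to the single most likely token, which is why it tends to be more accurate than greedy decoders. A toy illustration of the idea, using a hypothetical table of per-step token log-probabilities rather than this project's actual decoder:

```python
def beam_search(step_logprobs, beam_size=2):
    """Toy beam search over a fixed table of per-step token log-probs.

    step_logprobs: list of dicts mapping token -> log-probability.
    Returns the token sequence with the highest cumulative score.
    """
    beams = [([], 0.0)]  # (tokens so far, cumulative log-prob)
    for logprobs in step_logprobs:
        # Extend every surviving hypothesis with every candidate token...
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in logprobs.items()
        ]
        # ...then keep only the beam_size best partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return max(beams, key=lambda b: b[1])[0]
```

A real speech decoder scores each token conditioned on the prefix chosen so far, which is where beam search pulls ahead of greedy decoding; this sketch only shows the pruning mechanics.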

Why should you pick this project over the alternatives? This project has the lowest latency (measured


